Search | Biblioteca Virtual em Saúde

1.

Statistical sampling of missing environmental variables improves biophysical genomic prediction in wheat.

Jighly, Abdulqader; Thayalakumaran, Thabo; Kant, Surya; Panozzo, Joe; Aggarwal, Rajat; Hessel, David; Forrest, Kerrie L; Technow, Frank; Totir, Radu; Goddard, Mike; Pryce, Jennie; Hayden, Matthew J; Munkvold, Jesse; O'Leary, Garry J.

Theor Appl Genet ; 137(5): 108, 2024 Apr 18.

Article in English | MEDLINE | ID: mdl-38637355

ABSTRACT

KEY MESSAGE: The integration of genomic prediction with crop growth models enabled the estimation of missing environmental variables which improved the prediction accuracy of grain yield. Since the invention of whole-genome prediction (WGP) more than two decades ago, breeding programmes have established extensive reference populations that are cultivated under diverse environmental conditions. The introduction of the CGM-WGP model, which integrates crop growth models (CGM) with WGP, has expanded the applications of WGP to the prediction of unphenotyped traits in untested environments, including future climates. However, CGMs require multiple seasonal environmental records, unlike WGP, which makes CGM-WGP less accurate when applied to historical reference populations that lack crucial environmental inputs. Here, we investigated the ability of CGM-WGP to approximate missing environmental variables to improve prediction accuracy. Two environmental variables in a wheat CGM, initial soil water content (InitlSoilWCont) and initial nitrate profile, were sampled from different normal distributions separately or jointly in each iteration within the CGM-WGP algorithm. Our results showed that sampling InitlSoilWCont alone gave the best results and improved the prediction accuracy of grain number by 0.07, yield by 0.06 and protein content by 0.03. When using the sampled InitlSoilWCont values as an input for the traditional CGM, the average narrow-sense heritability of the genotype-specific parameters (GSPs) improved by 0.05, with GNSlope, PreAnthRes, and VernSen showing the greatest improvements. Moreover, the root mean square of errors for grain number and yield was reduced by about 7% for CGM and 31% for CGM-WGP when using the sampled InitlSoilWCont values. 
Our results demonstrate the advantage of sampling missing environmental variables in CGM-WGP to improve prediction accuracy and increase the size of the reference population by enabling the utilisation of historical data that are missing environmental records.

Subjects

Plant Breeding , Triticum , Triticum/genetics , Genome , Genomics/methods , Genotype , Phenotype , Edible Grain/genetics , Models, Genetic

2.

MetaGS: an accurate method to impute and combine SNP effects across populations using summary statistics.

Jighly, Abdulqader; Benhajali, Haifa; Liu, Zengting; Goddard, Mike E.

Genet Sel Evol ; 54(1): 37, 2022 Jun 02.

Article in English | MEDLINE | ID: mdl-35655152

ABSTRACT

BACKGROUND: Meta-analysis describes a category of statistical methods that aim at combining the results of multiple studies to increase statistical power by exploiting summary statistics. Different industries that use genomic prediction do not share their raw data due to logistic or privacy restrictions, which can limit the size of their reference populations and creates a need for a practical meta-analysis method. RESULTS: We developed a meta-analysis, named MetaGS, that duplicates the results of multi-trait best linear unbiased prediction (mBLUP) analysis without accessing raw data. MetaGS exploits the correlations among different populations to produce more accurate population-specific single nucleotide polymorphism (SNP) effects. The method improves SNP effect estimations for a given population depending on its relations to other populations. MetaGS was tested on milk, fat and protein yield data of Australian Holstein and Jersey cattle and it generated very similar genomic estimated breeding values to those produced using the mBLUP method for all traits in both breeds. One of the major difficulties when combining SNP effects across populations is the use of different variants for the populations, which limits the applications of meta-analysis in practice. We solved this issue by developing a method to impute missing summary statistics without using raw data. Our results showed that imputing summary statistics can be done with high accuracy (r > 0.9) even when more than 70% of the SNPs were missing with a minimal effect on prediction accuracy. CONCLUSIONS: We demonstrated that MetaGS can replace the mBLUP model when raw data cannot be shared, which can lead to more flexible collaborations compared to the single-trait BLUP model.

Subjects

Genomics , Polymorphism, Single Nucleotide , Animals , Australia , Cattle/genetics , Genome , Genomics/methods , Phenotype

3.

On the use of whole-genome sequence data for across-breed genomic prediction and fine-scale mapping of QTL.

Meuwissen, Theo; van den Berg, Irene; Goddard, Mike.

Genet Sel Evol ; 53(1): 19, 2021 Feb 26.

Article in English | MEDLINE | ID: mdl-33637049

ABSTRACT

BACKGROUND: Whole-genome sequence (WGS) data are increasingly available on large numbers of individuals in animal and plant breeding and in human genetics through second-generation resequencing technologies, 1000 genomes projects, and large-scale genotype imputation from lower marker densities. Here, we present a computationally fast implementation of a variable selection genomic prediction method, that could handle WGS data on more than 35,000 individuals, test its accuracy for across-breed predictions and assess its quantitative trait locus (QTL) mapping precision. METHODS: The Monte Carlo Markov chain (MCMC) variable selection model (Bayes GC) fits simultaneously a genomic best linear unbiased prediction (GBLUP) term, i.e. a polygenic effect whose correlations are described by a genomic relationship matrix (G), and a Bayes C term, i.e. a set of single nucleotide polymorphisms (SNPs) with large effects selected by the model. Computational speed is improved by a Metropolis-Hastings sampling that directs computations to the SNPs, which are, a priori, most likely to be included into the model. Speed is also improved by running many relatively short MCMC chains. Memory requirements are reduced by storing the genotype matrix in binary form. The model was tested on a WGS dataset containing Holstein, Jersey and Australian Red cattle. The data contained 4,809,520 genotypes on 35,549 individuals together with their milk, fat and protein yields, and fat and protein percentage traits. RESULTS: The prediction accuracies of the Jersey individuals improved by 1.5% when using across-breed GBLUP compared to within-breed predictions. Using WGS instead of 600 k SNP-chip data yielded on average a 3% accuracy improvement for Australian Red cows. QTL were fine-mapped by locating the SNP with the highest posterior probability of being included in the model. 
Various QTL known from the literature were rediscovered, and a new SNP affecting milk production was discovered on chromosome 20 at 34.501126 Mb. Due to the high mapping precision, it was clear that many of the discovered QTL were the same across the five dairy traits. CONCLUSIONS: Across-breed Bayes GC genomic prediction improved prediction accuracies compared to GBLUP. The combination of across-breed WGS data and Bayesian genomic prediction proved remarkably effective for the fine-mapping of QTL.

Subjects

Breeding/methods , Cattle/genetics , Genome-Wide Association Study/methods , Quantitative Trait Loci , Whole Genome Sequencing/methods , Animals , Female , Male , Meat Products/standards , Quantitative Trait, Heritable

4.

Meta-analysis for milk fat and protein percentage using imputed sequence variant genotypes in 94,321 cattle from eight cattle breeds.

van den Berg, Irene; Xiang, Ruidong; Jenko, Janez; Pausch, Hubert; Boussaha, Mekki; Schrooten, Chris; Tribout, Thierry; Gjuvsland, Arne B; Boichard, Didier; Nordbø, Øyvind; Sanchez, Marie-Pierre; Goddard, Mike E.

Genet Sel Evol ; 52(1): 37, 2020 Jul 07.

Article in English | MEDLINE | ID: mdl-32635893

ABSTRACT

BACKGROUND: Sequence-based genome-wide association studies (GWAS) provide high statistical power to identify candidate causal mutations when a large number of individuals with both sequence variant genotypes and phenotypes is available. A meta-analysis combines summary statistics from multiple GWAS and increases the power to detect trait-associated variants without requiring access to data at the individual level of the GWAS mapping cohorts. Because linkage disequilibrium between adjacent markers is conserved only over short distances across breeds, a multi-breed meta-analysis can improve mapping precision. RESULTS: To maximise the power to identify quantitative trait loci (QTL), we combined the results of nine within-population GWAS that used imputed sequence variant genotypes of 94,321 cattle from eight breeds, to perform a large-scale meta-analysis for fat and protein percentage in cattle. The meta-analysis detected (p ≤ 10⁻⁸) 138 QTL for fat percentage and 176 QTL for protein percentage. This was more than the number of QTL detected in all within-population GWAS together (124 QTL for fat percentage and 104 QTL for protein percentage). Among all the lead variants, 100 QTL for fat percentage and 114 QTL for protein percentage had the same direction of effect in all within-population GWAS. This indicates either persistence of the linkage phase between the causal variant and the lead variant across breeds or that some of the lead variants might indeed be causal or tightly linked with causal variants. The percentage of intergenic variants was substantially lower for significant variants than for non-significant variants, and significant variants had mostly moderate to high minor allele frequencies. Significant variants were also clustered in genes that are known to be relevant for fat and protein percentages in milk. CONCLUSIONS: Our study identified a large number of QTL associated with fat and protein percentage in dairy cattle.
We demonstrated that large-scale multi-breed meta-analysis reveals more QTL at the nucleotide resolution than within-population GWAS. Significant variants were more often located in genic regions than non-significant variants and a large part of them was located in potentially regulatory regions.

Subjects

Cattle/genetics , Genotype , Linkage Disequilibrium , Lipids/genetics , Milk Proteins/genetics , Milk/standards , Animals , Gene Frequency , Milk/metabolism , Polymorphism, Genetic , Quantitative Trait Loci
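
The abstract of entry 4 describes combining summary statistics from several within-population GWAS without access to individual-level data. The abstract does not say which combination rule was used, so the following is only a minimal sketch of one common choice, a fixed-effect inverse-variance-weighted meta-analysis for a single variant. The function name `ivw_meta` and the input layout are assumptions made for illustration, not taken from the paper.

```python
import math


def ivw_meta(effects, std_errors):
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    effects    -- per-cohort effect estimates for the same allele
    std_errors -- per-cohort standard errors (all > 0)
    Returns (pooled effect, pooled standard error, z score, two-sided p-value).
    Illustrative only: the cited study may have used a different rule.
    """
    if len(effects) != len(std_errors) or not effects:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, z, p_value


if __name__ == "__main__":
    # Hypothetical per-breed estimates for one variant.
    print(ivw_meta([0.12, 0.09, 0.15], [0.04, 0.05, 0.06]))
```

Cohorts with smaller standard errors receive larger weights, which is why pooling many within-population results can detect variants that no single GWAS reaches significance for.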

5.

Strong selection pressures maintain divergence on genomic islands in Atlantic cod (Gadus morhua L.) populations.

Rodríguez-Ramilo, Silvia T; Baranski, Matthew; Moghadam, Hooman; Grove, Harald; Lien, Sigbjørn; Goddard, Mike E; Meuwissen, Theo H E; Sonesson, Anna K.

Genet Sel Evol ; 51(1): 61, 2019 Oct 29.

Article in English | MEDLINE | ID: mdl-31664896

ABSTRACT

BACKGROUND: Two distinct populations have been extensively studied in Atlantic cod (Gadus morhua L.): the Northeast Arctic cod (NEAC) population and the coastal cod (CC) population. The objectives of the current study were to identify genomic islands of divergence and to propose an approach to quantify the strength of selection pressures using whole-genome single nucleotide polymorphism (SNP) data. After applying filtering criteria, information on 93 animals (9 CC individuals, 50 NEAC animals and 34 CC × NEAC crossbred individuals) and 3,123,434 autosomal SNPs were used. RESULTS: Four genomic islands of divergence were identified on chromosomes 1, 2, 7 and 12, which were mapped accurately based on SNP data and which extended in size from 11 to 18 Mb. These regions differed considerably between the two populations although the differences in the rest of the genome were small due to considerable gene flow between the populations. The estimates of selection pressures showed that natural selection was substantially more important than genetic drift in shaping these genomic islands. Our data confirmed results from earlier publications that suggested that genomic islands are due to chromosomal rearrangements that are under strong selection and reduce recombination between rearranged and non-rearranged segments. CONCLUSIONS: Our findings further support the hypothesis that selection and reduced recombination in genomic islands may promote speciation between these two populations although their habitats overlap considerably and migrations occur between them.

Subjects

Gadus morhua/genetics , Genomic Islands , Polymorphism, Single Nucleotide , Selection, Genetic , Animals , Chromosomes/genetics , Gene Flow , Genetic Drift , Recombination, Genetic

6.

Putative bovine topological association domains and CTCF binding motifs can reduce the search space for causative regulatory variants of complex traits.

Wang, Min; Hancock, Timothy P; Chamberlain, Amanda J; Vander Jagt, Christy J; Pryce, Jennie E; Cocks, Benjamin G; Goddard, Mike E; Hayes, Benjamin J.

BMC Genomics ; 19(1): 395, 2018 May 24.

Article in English | MEDLINE | ID: mdl-29793448

ABSTRACT

BACKGROUND: Topological association domains (TADs) are chromosomal domains characterised by frequent internal DNA-DNA interactions. The transcription factor CTCF binds to conserved DNA sequence patterns called CTCF binding motifs to either prohibit or facilitate chromosomal interactions. TADs and CTCF binding motifs control gene expression, but they are not yet well defined in the bovine genome. In this paper, we sought to improve the annotation of bovine TADs and CTCF binding motifs, and assess whether the new annotation can reduce the search space for cis-regulatory variants. RESULTS: We used genomic synteny to map TADs and CTCF binding motifs from humans, mice, dogs and macaques to the bovine genome. We found that our mapped TADs exhibited the same hallmark properties of those sourced from experimental data, such as housekeeping genes, transfer RNA genes, CTCF binding motifs, short interspersed elements, H3K4me3 and H3K27ac. We showed that runs of genes with the same pattern of allele-specific expression (ASE) (either favouring paternal or maternal allele) were often located in the same TAD or between the same conserved CTCF binding motifs. Analyses of variance showed that when averaged across all bovine tissues tested, TADs explained 14% of ASE variation (standard deviation, SD: 0.056), while CTCF explained 27% (SD: 0.078). Furthermore, we showed that the quantitative trait loci (QTLs) associated with gene expression variation (eQTLs) or ASE variation (aseQTLs), which were identified from mRNA transcripts from 141 lactating cows' white blood and milk cells, were highly enriched at putative bovine CTCF binding motifs. The linearly-furthermost, and most-significant aseQTL and eQTL for each genic target were located within the same TAD as the gene more often than expected (Chi-Squared test P-value < 0.001). 
CONCLUSIONS: Our results suggest that genomic synteny can be used to functionally annotate conserved transcriptional components, and provides a tool to reduce the search space for causative regulatory variants in the bovine genome.

Subjects

CCCTC-Binding Factor/metabolism , Genomics , Nucleotide Motifs , Animals , Cattle , Protein Binding , Quantitative Trait Loci/genetics

7.

Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect.

van den Berg, Irene; Bowman, Phil J; MacLeod, Iona M; Hayes, Ben J; Wang, Tingting; Bolormaa, Sunduimijid; Goddard, Mike E.

Genet Sel Evol ; 49(1): 70, 2017 Sep 21.

Article in English | MEDLINE | ID: mdl-28934948

ABSTRACT

BACKGROUND: The increasing availability of whole-genome sequence data is expected to increase the accuracy of genomic prediction. However, results from simulation studies and analysis of real data do not always show an increase in accuracy from sequence data compared to high-density (HD) single nucleotide polymorphism (SNP) chip genotypes. In addition, the sheer number of variants makes analysis of all variants and accurate estimation of all effects computationally challenging. Our objective was to find a strategy to approximate the analysis of whole-sequence data with a Bayesian variable selection model. Using a simulated dataset, we applied a Bayes R hybrid model to analyse whole-sequence data, test the effect of dropping a proportion of variants during the analysis, and test how the analysis can be split into separate analyses per chromosome to reduce the elapsed computing time. We also investigated the effect of imputation errors on prediction accuracy. Subsequently, we applied the approach to a dataset that contained imputed sequences and records for production and fertility traits for 38,492 Holstein, Jersey, Australian Red and crossbred bulls and cows. RESULTS: With the simulated dataset, we found that prediction accuracy was highly increased for a breed that was not represented in the training population for sequence data compared to HD SNP data. Either dropping part of the variants during the analysis or splitting the analysis into separate analyses per chromosome decreased accuracy compared to analysing whole-sequence data. First, dropping variants from each chromosome and reanalysing the retained variants together resulted in an accuracy similar to that obtained when analysing whole-sequence data. Adding imputation errors decreased prediction accuracy, especially for errors in the validation population. With real data, using sequence variants resulted in accuracies that were similar to those obtained with the HD SNPs. 
CONCLUSIONS: We present an efficient approach to approximate analysis of whole-sequence data with a Bayesian variable selection model. The lack of increase in prediction accuracy when applied to real data could be due to imputation errors, which demonstrates the importance of developing more accurate methods of imputation or directly genotyping sequence variants that have a major effect in the prediction equation.

Subjects

Breeding , Cattle/genetics , Genomics/methods , Models, Genetic , Animals , Australia , Bayes Theorem , Databases, Genetic , Female , Genotype , Male , Polymorphism, Single Nucleotide

8.

Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle.

Daetwyler, Hans D; Capitan, Aurélien; Pausch, Hubert; Stothard, Paul; van Binsbergen, Rianne; Brøndum, Rasmus F; Liao, Xiaoping; Djari, Anis; Rodriguez, Sabrina C; Grohs, Cécile; Esquerré, Diane; Bouchez, Olivier; Rossignol, Marie-Noëlle; Klopp, Christophe; Rocha, Dominique; Fritz, Sébastien; Eggen, André; Bowman, Phil J; Coote, David; Chamberlain, Amanda J; Anderson, Charlotte; VanTassell, Curt P; Hulsegge, Ina; Goddard, Mike E; Guldbrandtsen, Bernt; Lund, Mogens S; Veerkamp, Roel F; Boichard, Didier A; Fries, Ruedi; Hayes, Ben J.

Nat Genet ; 46(8): 858-65, 2014 Aug.

Article in English | MEDLINE | ID: mdl-25017103

ABSTRACT

The 1000 bull genomes project supports the goal of accelerating the rates of genetic gain in domestic cattle while at the same time considering animal health and welfare by providing the annotated sequence variants and genotypes of key ancestor bulls. In the first phase of the 1000 bull genomes project, we sequenced the whole genomes of 234 cattle to an average of 8.3-fold coverage. This sequencing includes data for 129 individuals from the global Holstein-Friesian population, 43 individuals from the Fleckvieh breed and 15 individuals from the Jersey breed. We identified a total of 28.3 million variants, with an average of 1.44 heterozygous sites per kilobase for each individual. We demonstrate the use of this database in identifying a recessive mutation underlying embryonic death and a dominant mutation underlying lethal chondrodysplasia. We also performed genome-wide association studies for milk production and curly coat, using imputed sequence variants, and identified variants associated with these traits in cattle.

Subjects

Cattle/genetics , Genome , Amino Acid Sequence , Animals , Genome-Wide Association Study/methods , Genotype , Male , Molecular Sequence Data , Polymorphism, Single Nucleotide , Sequence Homology, Amino Acid

9.

Detection of quantitative trait loci in Bos indicus and Bos taurus cattle using genome-wide association studies.

Bolormaa, Sunduimijid; Pryce, Jennie E; Kemper, Kathryn E; Hayes, Ben J; Zhang, Yuandan; Tier, Bruce; Barendse, William; Reverter, Antonio; Goddard, Mike E.

Genet Sel Evol ; 45: 43, 2013 Oct 29.

Article in English | MEDLINE | ID: mdl-24168700

ABSTRACT

BACKGROUND: The apparent effect of a single nucleotide polymorphism (SNP) on phenotype depends on the linkage disequilibrium (LD) between the SNP and a quantitative trait locus (QTL). However, the phase of LD between a SNP and a QTL may differ between Bos indicus and Bos taurus because they diverged at least one hundred thousand years ago. Here, we test the hypothesis that the apparent effect of a SNP on a quantitative trait depends on whether the SNP allele is inherited from a Bos taurus or Bos indicus ancestor. METHODS: Phenotype data on one or more traits and SNP genotype data for 10 181 cattle from Bos taurus, Bos indicus and composite breeds were used. All animals had genotypes for 729 068 SNPs (real or imputed). Chromosome segments were classified as originating from B. indicus or B. taurus on the basis of the haplotype of SNP alleles they contained. Consequently, SNP alleles were classified according to their sub-species origin. Three models were used for the association study: (1) conventional GWAS (genome-wide association study), fitting a single SNP effect regardless of subspecies origin, (2) interaction GWAS, fitting an interaction between SNP and subspecies-origin, and (3) best variable GWAS, fitting the most significant combination of SNP and sub-species origin. RESULTS: Fitting an interaction between SNP and subspecies origin resulted in more significant SNPs (i.e. more power) than a conventional GWAS. Thus, the effect of a SNP depends on the subspecies that the allele originates from. Also, most QTL segregated in only one subspecies, suggesting that many mutations that affect the traits studied occurred after divergence of the subspecies or the mutation became fixed or was lost in one of the subspecies. CONCLUSIONS: The results imply that GWAS and genomic selection could gain power by distinguishing SNP alleles based on their subspecies origin, and that only few QTL segregate in both B. indicus and B. taurus cattle. 
Thus, the QTL that segregate in current populations likely resulted from mutations that occurred in one of the subspecies and can have both positive and negative effects on the traits. There was no evidence that selection has increased the frequency of alleles that increase body weight.

Subjects

Cattle/classification , Cattle/genetics , Genome-Wide Association Study/methods , Quantitative Trait Loci , Alleles , Animals , Body Weight/genetics , Breeding , Chromosomes , Gene Frequency , Genetic Variation , Genome , Genotype , Growth/genetics , Haplotypes , Phenotype , Polymorphism, Single Nucleotide , Selection, Genetic , Species Specificity

10.

Inferring demography from runs of homozygosity in whole-genome sequence, with correction for sequence errors.

MacLeod, Iona M; Larkin, Denis M; Lewin, Harris A; Hayes, Ben J; Goddard, Mike E.

Mol Biol Evol ; 30(9): 2209-23, 2013 Sep.

Article in English | MEDLINE | ID: mdl-23842528

ABSTRACT

Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges to fully utilize such large data sets and to ensure that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493-496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals.

Subjects

Cattle , Genetics, Population , Genome , Homozygote , Phylogeny , Animals , Cattle/classification , Cattle/genetics , Female , Genetic Loci , Haplotypes , Linkage Disequilibrium , Male , Markov Chains , Models, Genetic , Polymorphism, Single Nucleotide , Population Density , Sequence Analysis, DNA , Time Factors
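
The abstract of entry 10 fits a demography to the observed distribution of runs of homozygosity in a single genome. As a rough illustration of what that observed distribution is, the sketch below tabulates run lengths from a per-site list of heterozygosity flags. It is a simplification under stated assumptions: runs are counted in sites rather than base pairs, and the paper's correction for sequencing errors is not reproduced. The function name and input format are hypothetical.

```python
def runs_of_homozygosity(het_flags):
    """Return the lengths of consecutive homozygous runs.

    het_flags -- iterable of booleans, True where the site is heterozygous.
    A run is a maximal stretch of homozygous (False) sites; heterozygous
    sites end the current run and are not counted in any run.
    """
    runs = []
    current = 0
    for is_het in het_flags:
        if is_het:
            if current > 0:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current > 0:  # close a run that reaches the end of the sequence
        runs.append(current)
    return runs


if __name__ == "__main__":
    # Hypothetical calls along one chromosome segment.
    flags = [False, False, True, False, True, True, False, False, False]
    print(runs_of_homozygosity(flags))
```

A histogram of the returned lengths is the kind of summary a demographic model would then be asked to reproduce.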

11.

Accelerating improvement of livestock with genomic selection.

Meuwissen, Theo; Hayes, Ben; Goddard, Mike.

Annu Rev Anim Biosci ; 1: 221-37, 2013 Jan.

Article in English | MEDLINE | ID: mdl-25387018

ABSTRACT

Three recent breakthroughs have resulted in the current widespread use of DNA information: the genomic selection (GS) methodology, which is a form of marker-assisted selection on a genome-wide scale, and the discovery of large numbers of single-nucleotide markers and cost-effective methods to genotype them. GS estimates the effect of thousands of DNA markers simultaneously. Nonlinear estimation methods yield higher accuracy, especially for traits with major genes. The marker effects are estimated in a genotyped and phenotyped training population and are used for the estimation of breeding values of selection candidates by combining their genotypes with the estimated marker effects. The benefits of GS are greatest when selection is for traits that are not themselves recorded on the selection candidates before they can be selected. In the future, genome sequence data may replace SNP genotypes as markers. This could increase GS accuracy because the causative mutations should be included in the data.

Subjects

Breeding , Livestock/genetics , Selection, Genetic , Animals , Genetic Markers , Genetic Variation , Genome
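
The abstract of entry 11 says breeding values of selection candidates are estimated by combining their genotypes with marker effects estimated in a training population. A minimal sketch of that last step follows, assuming genotypes are coded as counts (0, 1 or 2) of the allele whose effect was estimated and that the effects are already available. The function name `gebv` and the example numbers are hypothetical.

```python
def gebv(genotypes, marker_effects):
    """Genomic estimated breeding value for one selection candidate.

    genotypes      -- allele counts (0, 1 or 2) per marker (assumed coding)
    marker_effects -- effect per allele copy, estimated beforehand in a
                      genotyped and phenotyped training population
    Returns the sum over markers of allele count times marker effect.
    """
    if len(genotypes) != len(marker_effects):
        raise ValueError("genotypes and marker_effects must align marker by marker")
    return sum(g * b for g, b in zip(genotypes, marker_effects))


if __name__ == "__main__":
    # Hypothetical candidate genotyped at three markers.
    print(gebv([2, 1, 0], [0.5, -0.2, 0.1]))
```

Because the candidate needs only a genotype, this value can be computed before any phenotype is recorded, which is the situation where the abstract says the benefits are greatest.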

12.

Understanding and predicting complex traits: knowledge from cattle.

Kemper, Kathryn E; Goddard, Mike E.

Hum Mol Genet ; 21(R1): R45-51, 2012 Oct 15.

Article in English | MEDLINE | ID: mdl-22899652

ABSTRACT

The genetic architecture of complex traits in cattle includes very large numbers of loci affecting any given trait. Most of these loci have small effects but occasionally there are loci with moderate-to-large effects segregating due to recent selection for the mutant allele. Genomic markers capture most but not all of the additive genetic variance for traits, probably because there are causal mutations with low allele frequency and therefore in incomplete linkage disequilibrium with the markers. The prediction of genetic value from genomic markers can achieve high accuracy by using statistical models that include all markers and assuming that marker effects are random variables drawn from a specified prior distribution. Recent effective population size is in the order of 100 within cattle breeds and ≈ 2500 animals with genotypes and phenotypes are sufficient to predict the genetic value of animals with an accuracy of 0.65. Recent effective population size for humans is much larger, in the order of 10,000-15,000, and more than 145,000 records would be required to reach a similar accuracy for people. However, our calculations assume that genomic markers capture all the genetic variance. This may be possible in the future as causal polymorphisms are genotyped using genome sequence data.

Subjects

Cattle/genetics , Chromosome Mapping , Inheritance Patterns , Quantitative Trait Loci , Quantitative Trait, Heritable , Animals , Genetic Markers , Genetic Variation , Genome , Genomics , Genotype , Linkage Disequilibrium , Mutation , Phenotype

13.

Imputation of missing genotypes from sparse to high density using long-range phasing.

Daetwyler, Hans D; Wiggans, George R; Hayes, Ben J; Woolliams, John A; Goddard, Mike E.

Genetics ; 189(1): 317-27, 2011 Sep.

Article in English | MEDLINE | ID: mdl-21705746

ABSTRACT

Related individuals share potentially long chromosome segments that trace to a common ancestor. We describe a phasing algorithm (ChromoPhase) that utilizes this characteristic of finite populations to phase large sections of a chromosome. In addition to phasing, our method imputes missing genotypes in individuals genotyped at lower marker density when more densely genotyped relatives are available. ChromoPhase uses a pedigree to collect an individual's (the proband) surrogate parents and offspring and uses genotypic similarity to identify its genomic surrogates. The algorithm then cycles through the relatives and genomic surrogates one at a time to find shared chromosome segments. Once a segment has been identified, any missing information in the proband is filled in with information from the relative. We tested ChromoPhase in a simulated population consisting of 400 individuals at a marker density of 1500/M, which is approximately equivalent to a 50K bovine single nucleotide polymorphism chip. In simulated data, 99.9% loci were correctly phased and, when imputing from 100 to 1500 markers, more than 87% of missing genotypes were correctly imputed. Performance increased when the number of generations available in the pedigree increased, but was reduced when the sparse genotype contained fewer loci. However, in simulated data, ChromoPhase correctly imputed at least 12% more genotypes than fastPHASE, depending on sparse marker density. We also tested the algorithm in a real Holstein cattle data set to impute 50K genotypes in animals with a sparse 3K genotype. In these data 92% of genotypes were correctly imputed in animals with a genotyped sire. We evaluated the accuracy of genomic predictions with the dense, sparse, and imputed simulated data sets and show that the reduction in genomic evaluation accuracy is modest even with imperfectly imputed genotype data. 
Our results demonstrate that imputation of missing genotypes, and potentially full genome sequence, using long-range phasing is feasible.

Subjects

Genotype , Polymorphism, Single Nucleotide , Alleles , Animals , Cattle , Crossing Over, Genetic , Female , Genetic Drift , Genetic Markers , Genetics, Population , Genome-Wide Association Study , Male , Quantitative Trait Loci

14.

Genome-wide association and genomic selection in animal breeding.

Hayes, Ben; Goddard, Mike.

Genome ; 53(11): 876-83, 2010 Nov.

Article in English | MEDLINE | ID: mdl-21076503

ABSTRACT

Results from genome-wide association studies in livestock and humans have led to the conclusion that the effects of individual quantitative trait loci (QTL) on complex traits, such as yield, are likely to be small; therefore, a large number of QTL are necessary to explain genetic variation in these traits. Given this genetic architecture, gains from marker-assisted selection (MAS) programs using only a small number of DNA markers to trace a limited number of QTL are likely to be small. This has led to the development of alternative technology for using the available dense single nucleotide polymorphism (SNP) information, called genomic selection. Genomic selection uses a genome-wide panel of dense markers so that all QTL are likely to be in linkage disequilibrium with at least one SNP. The genomic breeding values are predicted to be the sum of the effect of these SNPs across the entire genome. In dairy cattle breeding, the accuracy of genomic estimated breeding values (GEBV) that can be achieved and the fact that these are available early in life have led to rapid adoption of the technology. Here, we discuss the design of experiments necessary to achieve accurate prediction of GEBV in future generations in terms of the number of markers necessary and the size of the reference population where marker effects are estimated. We also present a simple method for implementing genomic selection using a genomic relationship matrix. Future challenges discussed include using whole genome sequence data to improve the accuracy of genomic selection and management of inbreeding through genomic relationships.

Subjects

Breeding , Genome-Wide Association Study , Genome , Livestock/genetics , Genetic Selection/genetics , Animals , Genotype , Linkage Disequilibrium , Quantitative Trait Loci

15.

Genetic architecture of complex traits and accuracy of genomic prediction: coat colour, milk-fat percentage, and type in Holstein cattle as contrasting model traits.

Hayes, Ben J; Pryce, Jennie; Chamberlain, Amanda J; Bowman, Phil J; Goddard, Mike E.

PLoS Genet ; 6(9): e1001139, 2010 Sep 23.

Article in English | MEDLINE | ID: mdl-20927186

ABSTRACT

Prediction of genetic merit using dense SNP genotypes can be used for estimation of breeding values for selection of livestock, crops, and forage species; for prediction of disease risk; and for forensics. The accuracy of these genomic predictions depends in part on the genetic architecture of the trait, in particular the number of loci affecting the trait and the distribution of their effects. Here we investigate the difference among three traits in the distribution of effects and the consequences for the accuracy of genomic predictions. Proportion of black coat colour in Holstein cattle was used as one model complex trait. Three loci, KIT, MITF, and a locus on chromosome 8, together explain 24% of the variation in proportion of black. However, a surprisingly large number of loci of small effect are necessary to capture the remaining variation. A second trait, fat concentration in milk, had one locus of large effect and a host of loci with very small effects. Both these distributions of effects were in contrast to that for a third trait, an index of scores for a number of aspects of cow conformation ("overall type"), which had only loci of small effect. The differences in distribution of effects among the three traits were quantified by estimating the distribution of variance explained by chromosome segments containing 50 SNPs. This approach was taken to account for the imperfect linkage disequilibrium between the SNPs and the QTL affecting the traits. We also show that the accuracy of predicting genetic values is higher for traits with a proportion of large effects (proportion black and fat percentage) than for a trait with no loci of large effect (overall type), provided the method of analysis takes advantage of the distribution of loci effects.

Subjects

Cattle/genetics , Genome/genetics , Genomics/methods , Lipids/chemistry , Milk/chemistry , Heritable Quantitative Trait , Skin Pigmentation/genetics , Animals , Breeding , Mammalian Chromosomes/genetics , Genome-Wide Association Study , Male , Phenotype , Single Nucleotide Polymorphism/genetics , Quantitative Trait Loci/genetics , Reproducibility of Results

16.

Copy number variation and transposable elements feature in recent, ongoing adaptation at the Cyp6g1 locus.

Schmidt, Joshua M; Good, Robert T; Appleton, Belinda; Sherrard, Jayne; Raymant, Greta C; Bogwitz, Michael R; Martin, Jon; Daborn, Phillip J; Goddard, Mike E; Batterham, Philip; Robin, Charles.

PLoS Genet ; 6(6): e1000998, 2010 Jun 24.

Article in English | MEDLINE | ID: mdl-20585622

ABSTRACT

The increased transcription of the Cyp6g1 gene of Drosophila melanogaster, and consequent resistance to insecticides such as DDT, is a widely cited example of adaptation mediated by cis-regulatory change. A fragment of an Accord transposable element inserted upstream of the Cyp6g1 gene is causally associated with resistance and has spread to high frequencies in populations around the world since the 1940s. Here we report the existence of a natural allelic series at this locus of D. melanogaster, involving copy number variation of Cyp6g1, and two additional transposable element insertions (a P and an HMS-Beagle). We provide evidence that this genetic variation underpins phenotypic variation, as the more derived the allele, the greater the level of DDT resistance. Tracking the spatial and temporal patterns of allele frequency changes indicates that the multiple steps of the allelic series are adaptive. Further, a DDT association study shows that the most resistant allele, Cyp6g1-[BP], is greatly enriched in the top 5% of the phenotypic distribution and accounts for approximately 16% of the underlying phenotypic variation in resistance to DDT. In contrast, copy number variation for another candidate resistance gene, Cyp12d1, is not associated with resistance. Thus the Cyp6g1 locus is a major contributor to DDT resistance in field populations, and evolution at this locus features multiple adaptive steps occurring in rapid succession.

Subjects

Cytochrome P-450 Enzyme System/genetics , DNA Copy Number Variations , DNA Transposable Elements , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Biological Adaptation , Alleles , Animals , Genetically Modified Animals , Genetic Loci , Genetic Transcription

17.

The use of family relationships and linkage disequilibrium to impute phase and missing genotypes in up to whole-genome sequence density genotypic data.

Meuwissen, Theo; Goddard, Mike.

Genetics ; 185(4): 1441-9, 2010 Aug.

Article in English | MEDLINE | ID: mdl-20479147

ABSTRACT

A novel method, called linkage disequilibrium multilocus iterative peeling (LDMIP), for the imputation of phase and missing genotypes is developed. LDMIP performs an iterative peeling step for every locus, which accounts for the family data, and uses a forward-backward algorithm to accumulate information across loci. Marker similarity between haplotype pairs is used to impute possible missing genotypes and phases, which relies on the linkage disequilibrium between closely linked markers. After this imputation step, the combined iterative peeling/forward-backward algorithm is applied again, until convergence. The calculations per iteration scale linearly with number of markers and number of individuals in the pedigree, which makes LDMIP well suited to large numbers of markers and/or large numbers of individuals. Per iteration calculations scale quadratically with the number of alleles, which implies biallelic markers are preferred. In a situation with up to 15% randomly missing genotypes, the error rate of the imputed genotypes was <1% and approximately 99% of the missing genotypes were imputed. In another example, LDMIP was used to impute whole-genome sequence data consisting of 17,321 SNPs on a chromosome. Imputation of the sequence was based on the information of 20 (re)sequenced founder individuals and genotyping their descendants for a panel of 3000 SNPs. The error rate of the imputed SNP genotypes was 10%. However, if the parents of these 20 founders are also sequenced, >99% of missing genotypes are imputed correctly.

Subjects

Algorithms , Genome/genetics , Linkage Disequilibrium , Single Nucleotide Polymorphism , Animals , Chromosome Mapping , Computer Simulation , Family Health , Female , Genotype , Haplotypes , Humans , Male , Genetic Models , Pedigree

18.

Accurate prediction of genetic values for complex traits by whole-genome resequencing.

Meuwissen, Theo; Goddard, Mike.

Genetics ; 185(2): 623-31, 2010 Jun.

Article in English | MEDLINE | ID: mdl-20308278

ABSTRACT

Whole-genome resequencing technology has improved rapidly during recent years and is expected to improve further such that the sequencing of an entire human genome for $1000 is within reach. Our main aim here is to use whole-genome sequence data for the prediction of genetic values of individuals for complex traits and to explore the accuracy of such predictions. This is relevant for the fields of plant and animal breeding and, in human genetics, for the prediction of an individual's risk for complex diseases. Here, population history and genomic architectures were simulated under the Wright-Fisher population and infinite-sites mutation model, and prediction of genetic value was by the genomic selection approach, where a Bayesian nonlinear model was used to predict the effects of individual SNPs. The Bayesian model assumed a priori that only a few SNPs are causative, i.e., have an effect different from zero. When using whole-genome sequence data, accuracies of prediction of genetic value were >40% increased relative to the use of dense approximately 30K SNP chips. At equal high density, the inclusion of the causative mutations yielded an extra increase in accuracy of 2.5-3.7%. Predictions of genetic value remained accurate even when the training and evaluation data were 10 generations apart. Best linear unbiased prediction (BLUP) of SNP effects does not take full advantage of the genome sequence data, and nonlinear predictions, such as the Bayesian method used here, are needed to achieve maximum accuracy. On the basis of theoretical work, the results could be extended to more realistic genome and population sizes.

Subjects

Human Genome , Genome/genetics , Bayes Theorem , Chromosome Mapping , Genes , Genetic Testing , Humans , Mutation , Single Nucleotide Polymorphism , Population Density

19.

Accuracy of genomic breeding values in multi-breed dairy cattle populations.

Hayes, Ben J; Bowman, Phillip J; Chamberlain, Amanda C; Verbyla, Klara; Goddard, Mike E.

Genet Sel Evol ; 41: 51, 2009 Nov 24.

Article in English | MEDLINE | ID: mdl-19930712

ABSTRACT

BACKGROUND: Two key findings from genomic selection experiments are 1) the reference population used must be very large to subsequently predict accurate genomic estimated breeding values (GEBV), and 2) prediction equations derived in one breed do not predict accurate GEBV when applied to other breeds. Both findings are a problem for breeds where the number of individuals in the reference population is limited. A multi-breed reference population is a potential solution, and here we investigate the accuracies of GEBV in Holstein dairy cattle and Jersey dairy cattle when the reference population is single breed or multi-breed. The accuracies were obtained both as a function of elements of the inverse coefficient matrix and from the realised accuracies of GEBV. METHODS: Best linear unbiased prediction with a multi-breed genomic relationship matrix (GBLUP) and two Bayesian methods (BAYESA and BAYES_SSVS) which estimate individual SNP effects were used to predict GEBV for 400 young Holstein and 77 young Jersey bulls, from a reference population of 781 Holstein and 287 Jersey bulls. Genotypes of 39,048 SNP markers were used. Phenotypes in the reference population were de-regressed breeding values for production traits. For the GBLUP method, expected accuracies calculated from the diagonal of the inverse of the coefficient matrix were compared to realised accuracies. RESULTS: When GBLUP was used, expected accuracies from a function of elements of the inverse coefficient matrix agreed reasonably well with realised accuracies calculated from the correlation between GEBV and EBV in single breed populations, but not in multi-breed populations. When the Bayesian methods were used, realised accuracies of GEBV were up to 13% higher when the multi-breed reference population was used than when a pure breed reference was used. However, no consistent increase in accuracy across traits was obtained. CONCLUSION: Predicting genomic breeding values using a genomic relationship matrix is an attractive approach to implement genomic selection as expected accuracies of GEBV can be readily derived. However, in multi-breed populations, Bayesian approaches give higher accuracies for some traits. Finally, multi-breed reference populations will be a valuable resource to fine map QTL.

Subjects

Breeding , Cattle/genetics , Genome , Animals , Female , Genotype , Male , Genetic Models , Single Nucleotide Polymorphism , Quantitative Trait Loci

20.

A validated genome wide association study to breed cattle adapted to an environment altered by climate change.

Hayes, Ben J; Bowman, Phil J; Chamberlain, Amanda J; Savin, Keith; van Tassell, Curt P; Sonstegard, Tad S; Goddard, Mike E.

PLoS One ; 4(8): e6676, 2009 Aug 18.

Article in English | MEDLINE | ID: mdl-19688089

ABSTRACT

Continued production of food in areas predicted to be most affected by climate change, such as dairy farming regions of Australia, will be a major challenge in coming decades. Along with rising temperatures and water shortages, scarcity of inputs such as high energy feeds is predicted. With the motivation of selecting cattle adapted to these changing environments, we conducted a genome-wide association study to detect DNA markers (single nucleotide polymorphisms) associated with the sensitivity of milk production to environmental conditions. To do this we combined historical milk production and weather records with dense marker genotypes on dairy sires with many daughters milking across a wide range of production environments in Australia. Markers associated with the sensitivity of milk production to feeding level and to the temperature-humidity index, on chromosomes 9 and 29 respectively, were validated in two independent populations, one of which was a different breed of cattle. As the extent of linkage disequilibrium across cattle breeds is limited, the underlying causative mutations have been mapped to a small genomic interval containing two promising candidate genes. The validated marker panels we have reported here will aid selection for high milk production under anticipated climate change scenarios, for example selection of sires whose daughters will be most productive at low levels of feeding.

Subjects

Breeding , Cattle/genetics , Climate Change , Genome-Wide Association Study/veterinary , Animals , Australia
